(54) Title: INTERACTIVE SYSTEM FOR TEACHING SPEECH PRONUNCIATION AND READING 



(57) Abstract 

A system for teaching speech pronunciation or reading to a student has 
a memory (10) for storing a plurality of speech portions; a playback system 
(12, 18), associated with the memory, for indicating to a student (USER) a 
speech portion to be practiced; an algorithm associated with the memory and 
the playback system; a speech portion selector (20) for selecting a speech 
task (22, 24) to be practiced; and a sound recorder apparatus (16) operatively 
connected to the algorithm and operative to sense and record a sound uttered 
by a student, and to provide the utterance to the algorithm in signal form, 
wherein the speech recognition algorithm (30, 40) is operative to compare 
the utterance with the speech portion to be practiced and to evaluate the 
accuracy of the utterance; and the playback system being operable to provide 
to the student an indication of the accuracy of the utterance (64, 68). 
WO 99/13446 PCT/IL98/00426 
INTERACTIVE SYSTEM FOR TEACHING SPEECH PRONUNCIATION & READING 



FIELD OF THE INVENTION 
The present invention relates to speech teaching, generally, and, in particular, to 
a software-based system for the teaching of correct speech and reading. 

BACKGROUND OF THE INVENTION 

It has long been sought to provide ways of teaching correct pronunciation of a 
particular language, inter alia, for the purpose of learning correct pronunciation of a 
foreign language and for general speech therapy. It has also been sought to provide a tool 
for evaluation and guidance of one's reading of a native or foreign language. 

Known art includes the following patent publications: 

US Patent No. 4,636,173 which discloses a method of teaching reading by 
synchronizing a visual display with a soundtrack by momentarily highlighting displayed 
words as they are emitted by the soundtrack. 

US Patent No. 5,142,657 which relates to a computerized system for providing a 
visual output of analyses of speech including the parameters of waveform, power, pitch 
and sound spectrograph, and comparing these parameters with corresponding model 
parameters. 

US Patent No. 5,286,205, which relates to a method of teaching spoken English 
using mouth position characters. This method is based on a visual display of mouth 
positions required for different pronunciations. 

US Patent No. 5,393,236 which describes a computer-based interactive speech 
pronunciation apparatus and method. 

US Patent No. 5,487,671, which relates to a computerized system for teaching speech 
that evaluates accuracy of pronunciation relative to a stored database according to one 
or more speech parameters. 

US Patent No. 5,503,560 which relates to a computerized system for speech 
training, in which a user is prompted in the pronunciation of keywords. The system 
records a first attempt at pronunciation of a keyword and compares subsequent attempts 
with the first attempt. An improvement in pronunciation is claimed to be correlated with a 
significant deviation in user's speech template. There is also provided a display which 
shows a required mouth shape for the sounds to be learned. A video analysis of the 
user's actual mouth positions may also be provided by use of a video pick-up and 
analyzer. 

Published PCT application no. WO 91/00582 which relates to a system which 
compares pronunciation of a word or sentence with a reference word or sentence, and 
which provides audio and video displays of the comparison. 

The above patent publications are characterized by various disadvantages, as 
follows: 

US Patent No. 4,636,173 discloses a method which is not interactive, and thus 
does not provide any indication to a student as to the accuracy of his pronunciation, nor 
does it indicate a way of achieving correct pronunciation. 

US Patent No. 5,142,657 provides a computerized method which does not 
provide an easily interpretable feedback, and does not provide an explanation of how to 
improve pronunciation, merely which parameters of speech need to be improved. 
Furthermore, a display of these parameters, while they may be suitable for expert users 
in language laboratory, will not be helpful to less skilled students or children. 

US Patent No. 5,286,205 relates to a teaching method which is not interactive, 
such that a student has to judge for himself whether or not his pronunciation is correct, 
there being no objective feedback thereof. Furthermore, the method teaches use of 
different mouth positions, and cannot therefore be used for all sounds for which the 
mouth position is not the only important key to correct pronunciation. 

US Patent No. 5,393,236 relates to a computer-based interactive speech 
pronunciation method which is not self-sufficient and which requires supervision by an 
instructor and, moreover, does not in any way guide the user toward correct pronunciation. 

US Patent No. 5,487,671 relates to a computerized system for teaching speech 
that evaluates accuracy of production relative to a stored database according to one or 
more speech parameters. It does not provide to the user an indication as to how to 
improve his pronunciation, nor does it point to the user the nature of his mistakes, nor 
does it provide an algorithm that allows the evaluation of a given pronunciation, nor does 
it provide a methodology for dealing with pronunciation mistakes at various levels. 

US Patent No. 5,503,560 relates to a computerized system that judges 
improvement in pronunciation according to a deviation in the user's own voice, but it does 
not directly compare the user's pronunciation to that of native speakers of the language, nor 
does it direct the user as to how to improve his pronunciation, nor does it point out to the 
user the mistakes made within the phrase or keyword. 

Published PCT application no. WO 91/00582 describes a system which does not 
provide any indication of how to achieve correct pronunciation. 

In general, known methods do not enable a student either to learn correct 
pronunciation of parts of speech or to learn how to read (as when the student is a 
child being taught to read in his native language) with feedback that is based on 
totally objective criteria and is fully interpretable by, and thus immediately useful to, 
the student without requiring interpretation or guidance by an instructor. 



SUMMARY OF THE INVENTION 

It is thus an aim of the present invention to provide a fully interactive, 
self-contained system for teaching pronunciation of language sounds. This system may 
also be used for teaching a person how to read in his native language. In particular, a 
speech recognition algorithm is provided so as to enable full interaction between a 
student and the system, in real time. 

In particular, the software of the invention is employed in the system such that, in 
response to selected utterances, one or more visual stimuli, such as one or more 
moving images on a visual display unit, are activated in a desired manner. An utterance 
which is not sufficiently accurate activates the stimulus, but not in the desired manner. 
Instruction is provided by way of displaying the correct tongue position inside the mouth, 
also known as articulatory positioning. 

As will be appreciated from the description hereinbelow, the system of the 
invention operates both at the level of the individual phoneme, and also at the level of 
multi-phoneme strings, such as words and phrases. 

There is thus provided, in accordance with a preferred embodiment of the 
invention, a system for teaching speech pronunciation or reading to a student. The 
system includes a memory for storing a plurality of speech portions, a playback system, 
associated with the memory, for indicating to a student a speech portion to be practiced, 
and a speech portion selector for selecting a speech task to be practiced. 

The system is operated via an algorithm which is associated with the memory, the 
playback system, and also with a sound recorder which is operative to sense and record 
a sound uttered by a student, and to provide the utterance for processing by use of the 
algorithm, in signal form. The algorithm performs a comparison of the utterance with the 
speech portion to be practiced, evaluates the accuracy of the utterance, and provides 
to the student, via the playback system, an indication of the accuracy of the utterance. 

Further in accordance with a preferred embodiment of the present invention, the 
speech portion selector includes apparatus for selecting a phoneme in a selected 
phoneme class, and the algorithm is also operative to determine whether or not a 
phoneme present in the utterance belongs to the selected phoneme class. If a phoneme 
in the utterance is determined to be outside a selected phoneme class, then the 
utterance is 'rejected' as being inaccurate, and the student may be instructed to try 
again, if the system is operating at the single phoneme level. If the system is operating at 
the multi-phoneme string or word/phrase level, the student may be informed of the 
problematic phoneme or phonemes, referred to also herein as "subgroups," and 
instructed to practice them before proceeding with the more complex task. 

Additionally in accordance with a preferred embodiment of the present invention, 
the playback system includes visual playback apparatus and audio playback apparatus, 
and, in response to selection of a selected speech portion by a student, the visual 
playback apparatus is operative to display a visual image indicating the speech portion 
selected, and the audio playback apparatus is operative to provide an audible indication of 
the speech portion selected. 

Further in accordance with a preferred embodiment of the present invention, the 
playback system is operative to provide, preferably in real time, a dynamic visual image 
indicating the accuracy of the utterance. 

Additionally in accordance with a preferred embodiment of the present invention, 
the playback system includes apparatus for displaying a movable visual image which is 
movable between first and second locations on the display, wherein the first location is a 
start location at which the visual image is located prior to sensing of a sound by the 
sound recorder, and wherein the second location is a target location, towards which the 
playback system is operative to move the movable visual image in real time as an 
indication of the accuracy of the utterance. 

Further in accordance with a preferred embodiment of the present invention, the 
distance between the movable visual image and the target location is inversely 
proportional to the accuracy of the utterance. 

BRIEF DESCRIPTION OF THE DRAWINGS 

The present invention will be more fully understood and appreciated from the 
following detailed description, taken in conjunction with the drawings, in which: 

Fig. 1 is a block diagram representation of an interactive speech pronunciation 
teaching system, constructed and operative in accordance with the present invention; 

Fig. 2 is a schematic representation of the sequence of events employed in the 
present invention to teach correct speech and pronunciation; 

Fig. 3A is a diagrammatic representation of a visual display "prompt" screen 
provided by the system to a student, in response to selection by the student of a 
particular word or phoneme; 

Fig. 3B is a diagrammatic representation of a prompt screen provided by the 
system, in response to an incorrect pronunciation of a subgroup within a multi-phoneme 
string, visually emphasizing the incorrectly pronounced subgroup; 

Fig. 3C is a diagrammatic representation of a real time, visual feedback prompt 
screen; and 

Fig. 4 is a flow chart of the methodology employed by the present invention to 
analyze the speech of a student and to provide visual feedback of the student's 
performance. 

DETAILED DESCRIPTION OF THE INVENTION 
The present invention seeks to provide a computerized system for the teaching of 
correct speech, particularly by use of speech recognition algorithms. The object of the 
system is the interactive teaching of correct speech in a language, either foreign or 
native, as well as for speech therapy. The system may also be used to teach reading, 
particularly to a child, of his native language. As will be appreciated from the following 
description, the system utilizes a variety of techniques in order to achieve this task, 
including: speech recognition, speech evaluation; speech error detection; accent 
recognition; real time visual feedback; audio feedback; and articulator guidance. 

The system may be adapted for use with a variety of computer systems, and 
may, in a preferred embodiment of the present invention, be fully self-contained within, 
for example, a suitable multimedia personal computer. In accordance with other 
embodiments of the invention, the present teaching system may be adapted for use via 
any suitable multimedia-enabled computerized platform which may or may not be 
constructed specifically for the system of the invention, or the system may be based on a 
computer network, such as Internet, intranets and the like. 

Referring now to Fig. 1, the system based on a preferred embodiment of the 
present invention includes a computer 10 equipped with a sound card, a visual display 
unit (VDU) 12, typically a high-speed color monitor, a manual data input unit 14, which 
may be a keyboard and/or a pointing device, such as a mouse, glide pad, or a 
touch-sensitive screen forming part of VDU 12, a microphone 16, and a speaker 18. The 
hardware components shown and described herein may be totally conventional, and 
thus, no further specific description thereof is necessary. 

Referring now to Fig. 2, there is shown a schematic representation of the 
sequence of events in a typical speech and pronunciation teaching session with a 
student, according to a preferred embodiment of the present invention. Examples of 
"prompt" screens displayed to the student in such a session are shown in Figs. 3A 
through 3C. 

In a system implemented according to a preferred embodiment of the present 
invention, a typical speech and pronunciation teaching session includes the following 
sequence of events, shown schematically in Fig. 2: 

First (block 20): the student selects a "level" mode, typically including either a 
lesson containing plural phonemes, such as a word or phrase (block 22), or subgroups of 
such, or a single phoneme (block 24). 

It will be understood that a 'lesson' is a collection of production tasks having a 
common denominator. For example, in a production lesson for the phoneme /l/, the 
lesson plan would be words and phrases containing /l/ in various positions in the lesson 
words. 

Second (block 26): in the event that the word or phrase level has been selected, 
one or more words or sentences containing the lesson subject, exemplified as the word 
"SHELL," shown on prompt screen 60 in Fig. 3A, are presented to the student. 
Preferably, the word is also sounded by the system, so that the student hears the correct 
pronunciation, which he is then to repeat. 

In accordance with an alternative embodiment of the invention, the system may 
also be taught how to pronounce specific words or sounds, for example, so as to adapt it 
to a particular regional accent. In this case however, various default accents are retained 
in the system's memory so as to enable the system to be reset, if required. 

Third (block 28): the student repeats the lesson word or words into the 
microphone 16 (Fig. 1). 

Fourth (block 30): the student's speech is analyzed and evaluated for errors in 
pronunciation; errors are indicated, as by a display prompt seen on prompt screen 62 
in Fig. 3B, in which the subgroup, in this case the phoneme /l/, is indicated as having 
been mispronounced. 

In the event that the student's pronunciation was successful, such that the 
objective has been completed (block 32) (by a correct pronunciation of the selected 
lesson subject), he may then be returned either to the level select mode (block 20), or to 
the lesson select mode (block 22). 

In the event that the student's pronunciation was not successful, it is determined 
whether his mispronunciation is "segmental," namely, relating to the 
phrase/word/phoneme levels, or "super-segmental." If the problem is super-segmental, 
i.e. it relates to stress or intonation, he is referred to a system dealing with that particular 
problem. That type of system is known in the art and is beyond the scope of the 
invention, and is thus not dealt with herein. If the problem is determined to be segmental, 
then the student is transferred to the phoneme visual feedback and articulation mode 
(block 34), where he has the option of studying the inaccurately pronounced subgroup 
not only by imitation, but also by being shown the correct articulation, or required tongue 
positioning. In this mode, the system points out to the user the nature and location of his 
mistake. 

Accordingly, at this stage, the system provides the student with the option of 
replaying his own audio recording, while at the same time, providing a visual display of the 
subject phoneme (block 36). The system may also replay a model audio recording, for 
purposes of comparison. 

The student then attempts to repeat the subgroup (block 38), which the system 
analyzes and evaluates (block 40). If the student is unable to improve performance, the 
system enters visual feedback mode, indicated as "(audio)visual guidance and display," 
block 40, which is shown and described herein in conjunction with Fig. 3C below. 

Once the student has improved performance in this mode, the system may return him 
either to the word/phrase display level (block 26), or to the phoneme select level (block 
24), if he decides that single phonemes should be practiced and acquired prior to 
proceeding to word or phrase (subgroup) lessons. Otherwise, he is returned to a level 
whereat he practices the phoneme with which he is having trouble. 
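
The sequence of events of Fig. 2 can be read as a small state machine. The following Python sketch is illustrative only: the block numbers are taken from the description above, while the function name next_block, the outcome labels, and the choice of a single destination where the text allows several are assumptions made for the example.

```python
# Hypothetical sketch of the Fig. 2 session flow. Block numbers follow the text;
# the names and outcome labels are assumed for illustration.
LEVEL_SELECT, LESSON, PHONEME = 20, 22, 24
PRESENT, REPEAT, ANALYZE, DONE = 26, 28, 30, 32
FEEDBACK_MODE, REPLAY, REPEAT_SUBGROUP, ANALYZE_SUBGROUP = 34, 36, 38, 40
VISUAL_GUIDANCE = "visual guidance (Fig. 3C)"


def next_block(block, outcome=None):
    """Return the next step for the given block and analysis outcome."""
    if block == PRESENT:
        return REPEAT
    if block == REPEAT:
        return ANALYZE
    if block == ANALYZE:
        if outcome == "correct":
            return DONE
        if outcome == "segmental":
            return FEEDBACK_MODE
        return None  # super-segmental errors are referred to a separate system
    if block == DONE:
        return LEVEL_SELECT  # the text also allows a return to lesson select (22)
    if block == FEEDBACK_MODE:
        return REPLAY
    if block == REPLAY:
        return REPEAT_SUBGROUP
    if block == REPEAT_SUBGROUP:
        return ANALYZE_SUBGROUP
    if block == ANALYZE_SUBGROUP:
        # On improvement the text allows a return to block 26 or to block 24.
        return PRESENT if outcome == "improved" else VISUAL_GUIDANCE
    raise ValueError(f"block {block} depends on a choice made by the student")
```

Blocks 20, 22 and 24 are left unhandled on purpose, since the next step there depends on what the student selects rather than on an analysis result.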

By way of example, consider a case wherein the subject of the lesson is correct 
pronunciation of the /l/ sound. The student is shown an animation of the word "SHELL" 
60 on the visual display unit 12 (Fig. 1), as well as an appropriate icon (not shown) 
representing a shell. The system then plays back a model recording of "SHELL", and 
prompts the student to repeat it into the microphone 16. The student is provided with 
both visual and audio prompts. 

When the student repeats the word, and, for example, mispronounces /l/, the 
system points out the error to the student, seen at 64 in Fig. 3B. This will be by some 
form of animation (not shown), as well as by an audio indication such as "you have 
mispronounced the /l/ in 'shell.'" 

The student can choose either to try to pronounce the word again correctly, or to 
receive further guidance in the form of real time visual feedback and articulation. 

Real time visual feedback is based on the student's control of a targeting device 
appearing on a prompt screen 66, seen in Fig. 3C. By use of a speech recognition 
algorithm, as described below, the system extracts predetermined relevant speech 
parameters from the student's rendition of a test phrase, word or phoneme, and 
transforms the student's performance into a distance from the appropriate target. In the 
example shown in Fig. 3C, the student is shown a prompt screen with /l/ as a target, the 
other target being the phoneme /r/. Different targets may also be provided, a 'default' 
target being the relevant 'mistake.' In other words, if a common mistake made when 
pronouncing /l/ is the phoneme /r/, then the default target, as seen in the drawings, will 
be /r/. 

There exists, however, the option, particularly when the system of the invention is 
used within a supervised setting, of a supervisor (clinician or teacher) adding, changing 
or removing 'target' points of reference. 

A targeting device, referenced 68, is also shown, being exemplified by a circle, 
which is initially positioned at a 'zero' position, over a pair of cross hairs 70 and 72. 

Each time the student repeats the phoneme /l/, the targeting device 68 moves 
closer to or further from the target phoneme /l/, wherein the displayed distance between 
the targeting device 68 and the target phoneme is inversely proportional to the perceived 
acoustic "distance," or accuracy of the pronounced phoneme. If the student pronounces 
the phoneme correctly, the targeting device 68 is moved into coincidence with the target 
phoneme. An animation or other entertaining event may also be shown by way of 
reward. 
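
The movement of the targeting device can be illustrated with a short Python sketch. It assumes that the accuracy has already been reduced to a score between 0 and 1 and that the marker moves along a straight line; both are assumptions made for the example, since the description states only that the displayed distance shrinks as accuracy grows. The name marker_position is hypothetical.

```python
def marker_position(start, target, accuracy):
    """Place the marker on the straight segment from start to target.

    accuracy is assumed to lie in [0, 1]: 0 leaves the marker at the
    'zero' position and 1 moves it into coincidence with the target.
    Values outside that range are clamped.
    """
    a = min(1.0, max(0.0, accuracy))
    return (start[0] + a * (target[0] - start[0]),
            start[1] + a * (target[1] - start[1]))
```

With a start of (0, 0) and a target of (100, 50), an accuracy of 0.5 places the marker halfway, at (50.0, 25.0).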

In accordance with the present invention, a "correct" pronunciation is that whose 
extracted speech parameters are substantially the same as those of a database of 
recordings of that single sound, word or phrase, (which may also be referred to as a 
"multiple phoneme string"), adjudged to be well pronounced by a group of experts, such 
as speech therapists, or professional teachers of the language. 

In accordance with an alternative embodiment of the invention, however, the 
system may be 'taught' or adjusted online so as to adopt a new definition of 'correct.' For 
example, if a particular pronunciation is perceived to be good, even though the system 
judged it as "bad", the system may be adjusted so as to accept that particular sound as 
valid or correct, either for a particular user, or, in general, for a group of users. 
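
One simple way to picture a database-defined notion of 'correct' that can be adjusted online is sketched below in Python. The class name ReferenceSet, the Euclidean distance and the fixed tolerance are all assumptions made for the example; the description does not specify how "substantially the same" is measured.

```python
import math


def _distance(p, q):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


class ReferenceSet:
    """Expert-approved feature vectors for one sound, plus an acceptance radius."""

    def __init__(self, references, tolerance):
        self.references = [tuple(r) for r in references]
        self.tolerance = tolerance  # assumed: a single fixed radius

    def is_correct(self, features):
        """True if the features lie within tolerance of any stored reference."""
        return any(_distance(features, r) <= self.tolerance
                   for r in self.references)

    def teach(self, features):
        """Accept a pronunciation judged good by a supervisor by storing it."""
        self.references.append(tuple(features))
```

Keeping one ReferenceSet per user, or one shared by a group, would correspond to adjusting the definition for a particular user or for a group of users.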

Additional options of articulatory guidance available to the student are graphical 
and acoustic demonstrations. A further option allows the student to receive visual 
feedback in the form of spectrum and spectrographic real time display, with acoustic 
targets superimposed, (not shown). 

In a preferred embodiment of the present invention, the student is guided towards 
correct pronunciation by real time visual feedback. The analysis required to convert the 
student's rendition of the test phoneme or word is shown schematically in Fig. 4 and 
includes the following steps, all of which are performed in real time, by use of appropriate 
algorithms, as described below. 

It will be appreciated by persons skilled in the art that the system of the present 
invention is operative to enable the provision of feedback to a user, in real time, due to 
the use of novel speech recognition algorithms, as described below in conjunction with 
Fig. 4. 

While the speech algorithms and portions of the technique or techniques 
described below are known in various different fields, the use of speech recognition 
software in order to provide real time, objective speech pronunciation instruction, at the 
phoneme/word/phrase levels, such as in the present invention, is not known, per se, nor 
is it believed to have been considered in the art. 

In particular, the following techniques are provided in the invention, and are 
described in detail hereinbelow, namely: 

1. Primary Filtration: extraction of features enabling initial exclusion of fundamentally 
incorrect sounds. 

2. Statistical Analysis: filtering procedures for enhancing the relevant parameters, 
exemplified herein as cepstral parameters, and for reducing the weight of those which 
are not. 

3. Secondary Filtration: The use of "clustering pronunciation filters," for filtering out 
mispronunciations. 

4. Continuous Classification Network: Determining location of sounds in relevant 
phonetic space. 

Primary Filtration 

As a prerequisite to performing the analysis of the invention, a database of 
"correctly pronounced" phonemes and words is collected. As described above, this may 
be changed for the needs of a particular user. 

Accordingly, when detecting a sound, the speech parameters or features thereof 
are extracted (block 44) by the system by using "cepstral" techniques, as described in 
the book entitled "Discrete Time Processing Of Speech Signals," by John R. Deller, John 
G. Proakis, & John H. Hansen, published by Macmillan Publishing Co., NY, 1993. 
This includes calculation of the cepstrum, 1st and 2nd cepstral derivatives, determination 
of pitch, energy and zero crossing (i.e. the number of times in a given time period that 
the speech signal crosses a zero level so as to switch between positive and negative 
values and vice versa). These data are used so as to enable a primary filtration of 
fundamentally mispronounced sounds, i.e. those which are adjudged to be out of bounds 
of the defined task. 
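
Some of the features named above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of the invention: the frame is a plain list of samples, the cepstrum is the real cepstrum computed with a naive discrete Fourier transform, pitch estimation is omitted, and passes_primary_filter is a hypothetical bounds check whose limits would have to be chosen per task.

```python
import cmath
import math


def zero_crossings(frame):
    """Count sign changes between consecutive samples of one frame."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))


def energy(frame):
    """Sum of squared sample values."""
    return sum(x * x for x in frame)


def real_cepstrum(frame, eps=1e-12):
    """Inverse DFT of the log magnitude spectrum (naive O(n^2) DFT)."""
    n = len(frame)
    log_mag = []
    for k in range(n):
        s = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        log_mag.append(math.log(abs(s) + eps))  # eps avoids log(0)
    return [sum(log_mag[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def deltas(frames):
    """First difference between consecutive per-frame vectors.

    Applying it twice gives a simple second derivative.
    """
    return [[b - a for a, b in zip(prev, cur)]
            for prev, cur in zip(frames, frames[1:])]


def passes_primary_filter(value, low, high):
    """Hypothetical bounds check: reject features outside [low, high]."""
    return low <= value <= high
```

A production system would use a fast Fourier transform and overlapping windowed frames; the naive transform is used here only to keep the sketch free of external libraries.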

If the detected sounds are not rejected based on the above primary filtration, they 
are then subjected to a statistical analysis (block 46), prior to being passed to a 
secondary pronunciation filter (block 48). 

Statistical Analysis 

The statistical analysis includes two main steps and is used to determine the 
number of, and the nature of, the most relevant parameters, and to reduce the 
dimensionality of the system of parameters. 

The first step is in applying previously determined statistical weighting functions, 
thereby to enhance those parameters most relevant to the particular task at hand. These 
parameters are those which are statistically predetermined to have greater relevance to 
the task at hand. 

Subsequently, in a second step, all of the above parameters, regardless of 
weighting, are analyzed by use of Principal Component analysis, also known in the art as 
the Karhunen-Loeve Transform. This analysis provides a new set of parameters, each 
being a linear combination of the previous, weighted parameters, such that, the ranking 
of the new parameters is a function of the variability and thus also of task relevance 
thereof. After obtaining the new set of ranked, weighted parameters, the parameters set 
can be truncated so as to reduce dimensionality thereof, while retaining a number of 
parameters which has been predetermined to be statistically representative of the task 
data. 
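
The two steps of the statistical analysis (weighting, then Principal Component analysis with truncation) can be sketched as follows. The weights, the number of retained components, and all function names are assumptions for illustration; the eigenvectors are found by simple power iteration with deflation, chosen only to keep the example self-contained.

```python
import math


def apply_weights(samples, weights):
    """Step 1: scale each parameter by its predetermined weight."""
    return [[w * x for w, x in zip(weights, row)] for row in samples]


def covariance(samples):
    """Return (mean vector, sample covariance matrix)."""
    n, d = len(samples), len(samples[0])
    mean = [sum(row[j] for row in samples) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in samples]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return mean, cov


def principal_components(cov, k, iters=500):
    """Top-k (variance, axis) pairs, ranked by decreasing variance."""
    d = len(cov)
    cov = [row[:] for row in cov]
    components = []
    for _ in range(k):
        v = [1.0 / (i + 1) for i in range(d)]  # arbitrary start vector
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        components.append((lam, v))
        # Deflate: remove the axis just found before searching for the next.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return components


def reduce_dimension(samples, weights, k):
    """Step 2: rotate weighted data onto the top-k axes and truncate."""
    weighted = apply_weights(samples, weights)
    mean, cov = covariance(weighted)
    axes = [axis for _, axis in principal_components(cov, k)]
    return [[sum((x - m) * a for x, m, a in zip(row, mean, axis))
             for axis in axes]
            for row in weighted]
```

Each retained coordinate is a linear combination of the weighted parameters, and the ranking by variance is what allows the lower-ranked coordinates to be dropped.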

Secondary Filtration 

As known in the art, phonemes can be grouped into major classes which share 
specific features. The secondary filtration stage includes performing of a geometric 
cluster analysis, in order to filter out utterances of individual phonemes that fall outside 
the class to which the particular acoustic production task relates. For example, if the task 
were the correct utterance of various sounds in a particular fricative class, such as /s/, /f/, 
and so on, any mispronounced sounds which, by definition, could not be placed in the 
same phoneme class as these aforementioned fricatives, such as a "lateral" /s/, or a /l/, 
would be filtered out or rejected. 
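
A geometric class-membership test of this kind might look like the following Python sketch. The description does not name a specific clustering method, so the centroid-and-radius rule, the margin value, and the function names are assumptions made purely for illustration.

```python
import math


def _distance(p, q):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]


def belongs_to_class(features, class_examples, margin=1.5):
    """Accept the utterance if it lies within margin times the class radius.

    The class radius is taken as the largest distance from the centroid to
    any stored example of the phoneme class (an assumed, simplistic rule).
    """
    center = centroid(class_examples)
    radius = max(_distance(p, center) for p in class_examples)
    return _distance(features, center) <= margin * radius
```

An utterance for which belongs_to_class returns False would be rejected at this stage and would not reach the classification network.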

Continuous Classification Network 

Subsequently, a non-linear, continuous, classification network (block 50), based 
on such methods as neural networks, radial basis function (RBF) sets, or other, is trained 
using the extracted parameters so that its output will continuously span the relevant 
"phonetic space." The continuous classification network is employed in order to 
determine where exactly the detected sound resides within the relevant phonetic space. 
Referring to the last example, the detected sound, passing the cluster analysis based 
filter, may now be detected to reside anywhere between the /s/ and /f/ sounds. The 
targeting device will then be positioned accordingly. 
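
The idea of a continuous output that spans the space between two sounds can be illustrated with two Gaussian radial basis functions. In this Python sketch the centers and the width are fixed by hand rather than trained, and the function names are hypothetical; it shows only the kind of continuous output described, not the network of the invention.

```python
import math


def rbf(x, center, width):
    """Gaussian radial basis function of the distance from x to center."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, center))
    return math.exp(-d2 / (2.0 * width ** 2))


def phonetic_position(x, center_a, center_b, width=1.0):
    """Continuous position of x between sound A (0.0) and sound B (1.0).

    The position is the normalized response of B's basis function, so it
    varies smoothly as x moves from one center toward the other.
    """
    a = rbf(x, center_a, width)
    b = rbf(x, center_b, width)
    total = a + b
    if total == 0.0:
        return 0.5  # far from both centers: no evidence either way
    return b / total
```

With centers at (0, 0) and (4, 0), a sound at (2, 0) is placed exactly halfway, at 0.5.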

If the system is being operated in the word/phrase level mode, such that the 
sounds spoken by the student, and being analyzed by the system, are a multiple 
phoneme string, containing a number of subgroups, then video and aural indications are 
provided to the user, indicating the quality or correctness of his pronunciation. 

If, however, the system is being operated in the phonemic level mode, a further 
non-linear transform is used to project the neural network output onto the visual space of 
the display, so as to provide real time visual feedback, as described above in conjunction 
with Fig. 3C. 
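
A non-linear projection from a network output onto display coordinates could take many forms. The Python sketch below assumes a single output in the range 0 to 1 and a logistic curve rescaled to the screen; the function name, the sharpness parameter and the one-dimensional layout are all assumptions for illustration.

```python
import math


def project_to_screen(position, x_left, x_right, sharpness=6.0):
    """Map a network output in [0, 1] to a horizontal pixel coordinate.

    A logistic curve centered on 0.5 is an assumed choice of non-linear
    transform; it is rescaled so that 0 maps to x_left and 1 to x_right.
    """
    def squash(p):
        return 1.0 / (1.0 + math.exp(-sharpness * (p - 0.5)))

    lo, hi = squash(0.0), squash(1.0)
    p = min(1.0, max(0.0, position))  # clamp out-of-range outputs
    t = (squash(p) - lo) / (hi - lo)
    return x_left + t * (x_right - x_left)
```

A larger sharpness makes the marker move faster near the midpoint between the two targets and slower near either target.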

It will be appreciated by persons skilled in the art that the scope of the present 
invention is not limited by what has been shown and described above, merely by way of 
example. The scope of the invention is limited, rather, solely by the claims, which follow. 



CLAIMS 

1. A system for teaching speech pronunciation or reading to a student, which 
comprises: 

a memory for storing a plurality of speech portions; 

a playback system, associated with said memory, for indicating to a student a 
speech portion to be practiced; 

an algorithm associated with said memory; 

a speech portion selector for selecting a speech task to be practiced; and 

a sound recorder connected to said algorithm and operative to sense and record 
an utterance by a student, and to provide the utterance to said algorithm in signal form, 

wherein said algorithm is operative to compare the utterance with the speech task 
to be practiced and to evaluate the accuracy of the utterance, and is further operatively 
associated with said playback system, and is operative to cause it to provide to the 
student an indication of the accuracy of the utterance. 

2. A system according to claim 1, wherein said algorithm is operative to perform 
speech feature extraction so as to quantify spoken sounds in accordance with 
predetermined parameters, and is further operative to enhance parameters statistically 
representative of the selected speech task. 

3. A system according to claim 2, wherein said algorithm is further operative, during 
said speech feature extraction to determine a plurality of speech parameters, including 
the cepstrum, the 1st and 2nd cepstral derivatives, pitch, energy, zero crossing, or any 
other parameter or set of parameters derived from the speech signal. 

4. A system according to claim 3, wherein said algorithm is further operative to apply 
predetermined statistical weighting functions to predetermined speech parameters, 
thereby to enhance said predetermined speech parameters and so as to provide a set of 
weighted parameters. 

5. A system according to claim 4, wherein said algorithm is yet further operative to 
perform principal component analysis for performing a mathematical transform of said 
predetermined parameters, thus providing linear combinations of said weighted 
parameters, and thereby to provide a new set of parameters, ranked in accordance with 
variability and relevance to the speech task. 

6. A system according to claim 5, wherein said algorithm is further operative to 
truncate said new set of parameters so as to reduce dimensionality thereof, while 
retaining a predetermined number of statistically representative parameters. 

7. A system according to claim 2, wherein said speech portion selector is operative 
to select a phoneme in a selected phoneme class, 

and wherein said algorithm is also operative to perform a cluster analysis so as to 
determine whether or not a phoneme present in the utterance belongs to the selected 
phoneme class. 

8. A system according to claim 3, wherein said algorithm is further operative to 
evaluate the accuracy of an uttered phoneme, in accordance with said cluster analysis, 
by determining the location of the phoneme relative to an associated phonetic space. 

9. A system according to claim 8, wherein said algorithm is operative to determine 
the location of a spoken sound relative to the associated phonetic space, by employing a 
non-linear, continuous classification network for comparing the occurrence of said 
predetermined speech parameters with the extracted speech parameters, operative to 
provide an output corresponding thereto, and further, by projecting the classification 
network output onto said playback system. 

10. A system according to claim 9, wherein said algorithm operates said playback 
means to again display the inaccurately uttered phoneme for practicing by the student. 

11. A system according to claim 10, wherein said sound recorder is operative to 
sense and record a repeated phoneme, and to provide the repeated phoneme to said 
algorithm in signal form for evaluation thereof, and wherein said algorithm is operative to 
evaluate the repeated phoneme for accuracy. 

12. A system according to claim 2, wherein said speech portion selector is operative 
to select a multiple phoneme string, 

and wherein said algorithm is also operative to evaluate whether an uttered 
multiple phoneme string corresponds to the selected multiple phoneme string. 

13. A system according to claim 12, wherein said algorithm is operative to identify 
and evaluate for accuracy subgroups present in the uttered multiple phoneme string. 

14. A system according to claim 13, wherein said algorithm is operative to cause said 
playback system to provide to the student a sensible indication of the inaccurate 
subgroups in the uttered multiple phoneme string. 

15. A system according to claim 13, wherein said algorithm further determines the 
nature of the inaccuracy of each subgroup and further, in the event that the nature of the 
inaccuracy is determined to be segmental, to cause said playback means to display each 
phoneme of the subgroup for practicing by the student. 

16. A system according to claim 1, wherein said playback system comprises: 
a visual playback system, and 

an audio playback system, 
and wherein, in response to selection of a selected speech portion by a student, said 
visual playback system is operative to display a visual image indicating the speech 
portion selected, and said audio display means is operative to provide an audible 
indication of the speech portion selected. 

17. A system according to claim 16, wherein said playback system is operative to 
provide in real time a dynamic visual image indicating the accuracy of the utterance. 

18. A system according to claim 17, wherein said playback system comprises: 

a display of a movable visual image which is movable between first and second 
locations on said display, wherein said first location is a start location at which said visual 
image is located prior to sensing of a sound by said recorder, and wherein said second 
location is a target location, towards which said playback system is operative to move 
said movable visual image in real time as an indication of the accuracy of the utterance. 

19. A system according to claim 18, wherein the distance between said movable 
visual image and said target location is inversely proportional to the accuracy of the 
utterance. 

20. A system according to claim 1, wherein at least two of said memory, said 
playback system, and said algorithm are located remotely from one another, and are 
connected via a communications link. 